Social Networks

►  A  model  of  the  relationships  between  entities. 

►  Also  used  to  study  insurgent  groups,  terrorist  cells,  etc. 

►  Relates  actors  (nodes  in  the  network)  through  relationships 
(edges  in  the  network). 

►  Typically  used  for  small  groups,  with  full  knowledge  of  all 
links. 


Marriage  Network 


[Table columns: Family, Wealth, Betweenness, Eigenvector, Degree; rows not recovered]

Wealth & Betweenness c.: 0.3512

Wealth & Eigenvector c.: 0.5365


Covert  Networks 


►  Actors  have  a  vested  interest  in  not  being  observed. 

►  Networks  may  be  very  large. 

►  The  networks  change  in  time. 

►  Some  links  are  known  to  be  there,  some  known  to  be 
missing,  but  others  are  unknown. 

►  An  actor  may  try  to  hide  (change  email  address,  change 
phone  number,  start  calling  themselves  Colonel  Guapa). 


Methodology 


►  Assume  the  existence  of  a  “social  space”  S  which  controls 
the  structure  of  the  network. 

►  The  probability  of  an  edge  in  the  network  is  a  function  of 
the  “closeness”  of  the  nodes  in  S. 

►  The  social  space  provides  a  framework  from  which 
inference  can  be  performed. 


Social  Space 


►  Early  work  reported  by  Hoff  et  al  in  JASA. 

►  Model  based  on  location: 

► Probability of an edge between v_i and v_j a function of their
distance in social space.

►  Several  variations  proposed. 

►  Versions  of  the  Exponential  Random  Graph  Models 
(ERGMs)  (Hunter  et  al,  JASA  2008)  can  be  thought  of  in 
terms  of  a  “social  space”. 

►  We  will  discuss  a  “social  space”  model  that  has  a  simple 
least  squares  algorithm  for  fitting  the  parameters,  which 
can  be  used  on  large  graphs  (thousands  to  tens  of 
thousands  of  nodes  or  more). 
Graph Definitions


► A graph is a pair (V, E) where V is a set (vertices) and E
is a collection of unordered pairs of vertices (edges).

► We can consider directed graphs (V, A) where A (arcs or
arrows) are ordered pairs.

► The order of the graph is |V| and the size of the graph is
|E| (or |A| in the case of directed graphs (digraphs)).

► Vertices are sometimes called “nodes” or “actors”.

► Edges are sometimes called “links” or “relations”.

► The adjacency matrix A = (a_ij) is the |V| x |V| binary
matrix with a 1 in those places where an edge occurs in
the graph.


Probabilistic  Framework 


►  We  place  a  probability  structure  on  the  network. 

►  This  means  we  fit  a  generative  model  to  the  graph. 

►  This  allows  us  to  estimate  the  probability  of  a  missing 
(unknown)  link. 

►  We  can  bring  node  attributes  into  the  model. 

►  We  are  essentially  choosing  the  “most  likely”  graph  given 
the  model  assumption  and  the  observed  edges. 


Random  Dot  Product  Graphs 


► Each vertex v_i has associated with it a vector x_i.

► Place an edge v_i v_j between vertices v_i and v_j with
probability proportional to x_i · x_j, the dot product of x_i and x_j.

► Thus p_ij = f(x_i · x_j). We’ll use the threshold function for f:

$$
f(x) = \begin{cases} 0 & x < 0 \\ x & 0 \le x \le 1 \\ 1 & x > 1 \end{cases}
$$

► The edges in the random graph are no longer independent.

► We need to estimate the x_i from the observed graph.

► We can extend the model to directed graphs by having in-
and out-vectors x_i^in and x_i^out, with p_ij proportional to
x_i^out · x_j^in.
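As a rough illustration of the model on this slide, the sketch below draws one undirected graph from a set of vectors. The function names (`threshold`, `sample_rdpg`), the vector values, and the choice to decide each vertex pair once and then symmetrize are assumptions made for the example, not part of the original slides.

```python
import numpy as np

def threshold(x):
    """The threshold function f: clamp values into [0, 1]."""
    return np.clip(x, 0.0, 1.0)

def sample_rdpg(X, rng):
    """Draw an undirected graph from vectors X (one row per vertex)."""
    P = threshold(X @ X.T)                    # p_ij = f(x_i . x_j)
    n = P.shape[0]
    draws = (rng.random((n, n)) < P).astype(int)
    upper = np.triu(draws, k=1)               # decide each pair i < j once
    A = upper + upper.T                       # symmetric, zero diagonal
    return A, P

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.7, size=(6, 2))        # hypothetical vectors in a 2-d social space
A, P = sample_rdpg(X, rng)
```

The matrix `P` is what one would use to score a missing or unknown link once the vectors have been estimated.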




► Each vertex v_i has associated with it a vector x_i ∈ S.

►  The  proximity  (as  measured  by  the  dot  product)  of  two 
vectors  controls  the  probability  of  an  edge. 

►  Thus  S  is  the  space  which  defines  the  random  graph  that 
we  observe. 



Linear  Algebra  (Least  Squares) 


Note  that  if  we  want  to  find  the  vectors  U  which  best  “match”  the 
adjacency  matrix  A  (best  in  Frobenius  norm),  then  the  singular 
value  decomposition:  A  =  UDV'  almost  works  (the  problem  is 
the  diagonal).  Note  that  for  graphs  A  is  symmetric,  so  V  =  U. 

1. Set D = diag(0).

1.1 s = svd(A + D).

1.2 X = s$U, scaled by the singular values.

1.3 D = diag(XX').

2. Repeat 1.1-1.3 until convergence.

3.  Return  X. 
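The steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the embedding dimension `d` is chosen by the caller, "scaled by the singular values" is read as multiplying the leading singular vectors by the square roots of the singular values (so that XX' is on the scale of A), and the loop stops when the diagonal stops changing. The function names and the toy input are hypothetical.

```python
import numpy as np

def fit_vectors(A, d, tol=1e-8, max_iter=500):
    """Fit one d-dimensional vector per vertex so XX' matches A off the diagonal."""
    A = np.asarray(A, dtype=float)
    diag = np.zeros(A.shape[0])                      # step 1: D = diag(0)
    for _ in range(max_iter):
        U, s, _ = np.linalg.svd(A + np.diag(diag))   # step 1.1
        X = U[:, :d] * np.sqrt(s[:d])                # step 1.2 (assumed scaling)
        new_diag = np.sum(X * X, axis=1)             # step 1.3: diagonal of XX'
        change = np.max(np.abs(new_diag - diag))
        diag = new_diag
        if change < tol:                             # stop once the diagonal settles
            break
    return X

def offdiag_error(A, X):
    """Frobenius norm of XX' - A with the diagonal ignored."""
    R = X @ X.T - A
    np.fill_diagonal(R, 0.0)
    return np.linalg.norm(R)

# Toy input: exact dot products of known one-dimensional vectors, diagonal removed.
x_true = np.linspace(0.3, 0.9, 8)
P = np.outer(x_true, x_true)
np.fill_diagonal(P, 0.0)
X_once = fit_vectors(P, d=1, max_iter=1)             # plain SVD, no diagonal repair
X_fit = fit_vectors(P, d=1)                          # iterated version
```

Comparing `offdiag_error(P, X_once)` with `offdiag_error(P, X_fit)` shows how much of the misfit of the plain SVD comes from the diagonal. Note that the SVD of a symmetric matrix can rank a negative eigenvalue among its leading singular values, in which case XX' is not the best approximation; the sketch does not guard against that.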


The  Enron  Data 


►  Graphs  (directed  graphs)  of  emails  between  executives  at 
Enron. 

►  184  email  addresses  (nodes). 

►  150  executives  (names). 

►  187  weeks. 

►  Each  graph  corresponds  to  1  week  of  emails. 

► An edge v → w if there was an email from v to w within
the week.

►  Note:  we  are  ignoring  multiple  emails  and  an  email  from 
one  to  many  generates  a  “star”  of  edges. 
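To make the construction concrete, here is a small sketch of building one week's directed adjacency matrix. The record format (a sender plus a list of recipients), the function name, and the toy addresses are assumptions for illustration; dropping self-addressed messages is also an assumption.

```python
import numpy as np

def weekly_adjacency(messages, addresses):
    """Binary digraph for one week: A[v, w] = 1 if v emailed w at least once."""
    index = {addr: i for i, addr in enumerate(addresses)}
    A = np.zeros((len(addresses), len(addresses)), dtype=int)
    for sender, recipients in messages:
        for recipient in recipients:           # one-to-many mail yields a "star"
            if sender in index and recipient in index and sender != recipient:
                A[index[sender], index[recipient]] = 1   # repeats are ignored
    return A

addresses = ["a", "b", "c"]                    # hypothetical address labels
messages = [("a", ["b", "c"]), ("a", ["b"]), ("c", ["a"])]
A_week = weekly_adjacency(messages, addresses)
```

The second message from "a" to "b" changes nothing, which is the "ignoring multiple emails" convention from the slide.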


An  Alias 


[Figure: “Finding Patterns In Corporate Chatter”, a map of one week's Enron e-mail patterns. Legend: EMPLOYEE (E-MAIL ADDRESS). Labeled node: kenneth.lay.]

Caption: Computer scientists are analyzing about a half million Enron e-mails. Here is a map of a week's e-mail patterns in May 2001, when a new name suddenly appeared. Scientists found that this week's pattern differed greatly from others, suggesting …

Annotations:

► The analysis detected an anomaly: a new e-mail address for this
person, who had been "phillip.allen" for 131 previous weeks.

► Company leaders e-mail less frequently, leaving some
communication to subordinates.

The  Alias 


► k..allen did not appear in any prior graph.

►  Perusal  of  the  content  of  the  emails  determines  that  these 
were  sent  by  Phillip  Allen. 

►  phillip.allen  appears  in  the  previous  graphs. 

►  A  matched  filter  comparing  neighborhoods  was 
implemented  and  it  found  the  correct  match. 

►  In  this  work,  we  develop  a  “social  space”  version  of  the 
matched  filter. 


Outline 


Motivation 


Definitions  and  Model 


Alias  Identification 


Conclusions 


Aliases 


► Given two graphs G_t and G_{t+1}.

► Suppose we know some of the vertices are shared by
these graphs (and which ones they are).

► There is one vertex in G_{t+1} that we have not seen before.

► Assuming that this vertex appeared in G_t with a different
label, can we determine this vertex?


Aliases 


►  Setup: 

► Two graphs, G_t = (V ∪ U_t, E_t) and G_{t+1} = (V ∪ U_{t+1}, E_{t+1}).

► All vertices are labeled (email addresses).

► Vertices in V are named (individual associated with the
address).

► Vertices in U_t are not named.

► Want to associate the names to the vertices in U_{t+1}.


Methodology 


► Assign to vertex u the name of the vertex whose vector is
closest to the vector x_u.

►  Optimize: 


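$$
(X, Y_1, Y_2) = \arg\min_{X, Y_1, Y_2}
\left\| \left( \begin{pmatrix} X \\ Y_1 \end{pmatrix} \begin{pmatrix} X \\ Y_1 \end{pmatrix}' - A_1 \right)_0 \right\|_F
+ \left\| \left( \begin{pmatrix} X \\ Y_2 \end{pmatrix} \begin{pmatrix} X \\ Y_2 \end{pmatrix}' - A_2 \right)_0 \right\|_F
$$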

► M_0 means M with the diagonal replaced with zeros.

► Thus, we are attempting to fit one set of vectors to the known
vertices and a separate set to the unknown vertices in each of the
two graphs. Fitting to the knowns constrains the Y_i to lie in the
same space.
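The quantity being minimized can be written down directly. The sketch below assumes the objective is the sum of two Frobenius norms, one per graph, each taken after zeroing the diagonal; the function names and the toy vectors are hypothetical.

```python
import numpy as np

def zero_diag(M):
    """Return a copy of M with the diagonal replaced with zeros (M_0)."""
    M = M.copy()
    np.fill_diagonal(M, 0.0)
    return M

def alias_objective(X, Y1, Y2, A1, A2):
    """Misfit of shared vectors X and per-graph vectors Y1, Y2 to A1 and A2."""
    Z1 = np.vstack([X, Y1])                    # vectors for (V, U_1)
    Z2 = np.vstack([X, Y2])                    # vectors for (V, U_2)
    return (np.linalg.norm(zero_diag(Z1 @ Z1.T - A1), "fro")
            + np.linalg.norm(zero_diag(Z2 @ Z2.T - A2), "fro"))

# Toy check: matrices built from the vectors themselves give zero misfit.
X = np.array([[0.5, 0.1], [0.2, 0.6]])
Y1 = np.array([[0.3, 0.3]])
Y2 = np.array([[0.4, 0.2]])
A1 = zero_diag(np.vstack([X, Y1]) @ np.vstack([X, Y1]).T)
A2 = zero_diag(np.vstack([X, Y2]) @ np.vstack([X, Y2]).T)
exact = alias_objective(X, Y1, Y2, A1, A2)
worse = alias_objective(X, Y1 + 0.1, Y2, A1, A2)
```

Because X appears in both terms, moving a shared vector to improve the fit to one graph is penalized by the other, which is what ties the two sets of unknown vectors to a common space.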


The  Setup 


► Input A_1, A_2, the adjacency matrices of the graphs
corresponding to the vertices (V, U_i).

► Set B to be the average of A_1[V] and A_2[V], the blocks
corresponding to V.

► Set N = n + n_1 + n_2.


► Set A to be the N x N matrix with first n x n block equal to
B, and blocks A[V, U_i] = A_i[V, U_i], A[U_i, V] = A_i[U_i, V]:

$$
A = \begin{pmatrix}
\frac{A_1[V,V] + A_2[V,V]}{2} & A_1[V,U_1] & A_2[V,U_2] \\
A_1[U_1,V] & A_1[U_1,U_1] & Y \\
A_2[U_2,V] & Y' & A_2[U_2,U_2]
\end{pmatrix}
$$

where Y is the dot product of vectors derived from U_1 and U_2.


Fitting  the  Alias 


1. Set up as described previously.

2. Set D = 0_{N×N}.

3. Set the first n × n block of D equal to the dot products of
the result of running the least squares algorithm on B.

3.1 While (not converged):

3.2 Y = gd(A + D)

3.3 Set the unknown entries of D (such as those corresponding
to U_1 × U_2) to the dot products of the appropriate parts of Y.

4. Output Y.

► Use the vectors to find the alias: closest named vector to
the one associated with the alias.
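The procedure above can be sketched as follows. Several readings are assumptions here: the fitting step is taken to be the same scaled SVD embedding used by the least squares algorithm, only the diagonal of the V block is warm-started from the fit to B, and D holds just the unknown entries (the diagonal and the U_1 × U_2 blocks). All function names and the toy data are hypothetical; the toy uses expected (probability) matrices from a one-dimensional social space so the outcome is easy to follow.

```python
import numpy as np

def embed(M, d):
    """Rank-d vectors from the SVD, scaled by square roots of singular values (assumed)."""
    U, s, _ = np.linalg.svd(M)
    return U[:, :d] * np.sqrt(s[:d])

def impute_fit(A, unknown, d, D=None, tol=1e-9, max_iter=1000):
    """Alternate between embedding A + D and refreshing the unknown entries in D."""
    D = np.zeros_like(A) if D is None else D.copy()
    for _ in range(max_iter):
        Y = embed(A + D, d)
        D_new = np.where(unknown, Y @ Y.T, 0.0)    # dot products at unknown positions
        change = np.max(np.abs(D_new - D))
        D = D_new
        if change < tol:
            break
    return Y, D

def fit_alias(A1, A2, n, d):
    """A1, A2: matrices on (V, U_1) and (V, U_2); the first n rows/columns are V."""
    n1, n2 = A1.shape[0] - n, A2.shape[0] - n
    N = n + n1 + n2
    V, U1, U2 = slice(0, n), slice(n, n + n1), slice(n + n1, N)
    A = np.zeros((N, N))
    A[V, V] = (A1[:n, :n] + A2[:n, :n]) / 2.0      # B, the average of the V blocks
    A[V, U1], A[U1, V], A[U1, U1] = A1[:n, n:], A1[n:, :n], A1[n:, n:]
    A[V, U2], A[U2, V], A[U2, U2] = A2[:n, n:], A2[n:, :n], A2[n:, n:]
    unknown = np.eye(N, dtype=bool)                # the diagonal is unknown
    unknown[U1, U2] = unknown[U2, U1] = True       # so are the U_1 x U_2 blocks
    D = np.zeros((N, N))
    _, D[V, V] = impute_fit(A[V, V], np.eye(n, dtype=bool), d)   # warm start from B
    Y, _ = impute_fit(A, unknown, d, D)
    return Y[V], Y[U1], Y[U2]

def closest(candidates, target):
    """Index of the candidate vector nearest to the target vector."""
    return int(np.argmin(np.linalg.norm(candidates - target, axis=1)))

def expected_graph(v):
    """Dot products of one-dimensional vectors, with a zero diagonal."""
    P = np.outer(v, v)
    np.fill_diagonal(P, 0.0)
    return P

x = np.array([0.2, 0.4, 0.5, 0.6, 0.7, 0.8])       # shared, named vertices V
A1 = expected_graph(np.concatenate([x, [0.9, 0.3]]))   # U_1: old label (0.9), bystander (0.3)
A2 = expected_graph(np.concatenate([x, [0.9]]))        # U_2: the new label, same person
XV, Y1, Y2 = fit_alias(A1, A2, n=6, d=1)
match = closest(Y1, Y2[0])                         # candidate in U_1 nearest to the alias
```

In this toy the alias is compared against the unshared vertices of the first graph, on the assumption that the old label is among them; comparing against other candidate sets only changes the first argument of `closest`.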


[Figure: Alias Identification: k..allen and phillip.allen. Axis labels: Distance, Enron Executive.]


Cartoon 


[Figure: cartoon with labels “Social Space” and “Who am I?”]
Conclusions


►  Social  space  provides  a  mechanism  for  modeling  and 
inference  on  graphs  and  time  series  of  graphs. 

► The dot product graph model is simple, and easy to fit using
linear algebra.

►  Sparse  matrix  approaches  can  make  this  efficient: 

► There appears to be an O(n^s), 2 < s < 3, matrix multiply in
the algorithm, in order to determine the stopping criterion
(compute the error).

►  Some  tricks  can  be  played  to  reduce  this  for  this  application. 

►  By  using  only  the  change  in  the  diagonal  for  determining 
convergence,  we  eliminate  the  need  for  the  full  matrix 
multiply,  replacing  it  with  an  O(n)  operation.  Note  that  we 
only  need  to  check  the  diagonal,  since  once  this  stops 
changing  the  algorithm  produces  a  fixed  point. 

►  It  is  possible  to  add  covariates  (measurements  at  the 
nodes)  into  the  model  and  still  use  the  linear  algebra 
approach,  but  this  work  is  preliminary. 


Questions? 


Contact  Information:  dmarchette@gmail.com 


